04. Gradient Descent: The Code
Previously, we saw that the update for a single weight can be calculated as:
\Delta w_i = \eta \, \delta x_i
where the error term \delta is defined as
\delta = (y - \hat y) f'(h) = (y - \hat y) f'(\sum w_i x_i)
Remember, in the formula above, (y - \hat y) is the output error, and f'(h) is the derivative of the activation function f(h); we refer to this derivative as the output gradient.
Now, assuming there is only one output unit, let's write this out in code. We'll again use the sigmoid as the activation function f(h).
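Since we're using the sigmoid, its derivative has a convenient closed form (a standard identity, noted here because the code below leans on it):
f'(h) = f(h) \, (1 - f(h))
This is exactly why sigmoid_prime below is written as sigmoid(x) * (1 - sigmoid(x)).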
import numpy as np

# Defining the sigmoid function for activations
def sigmoid(x):
    return 1/(1+np.exp(-x))

# Derivative of the sigmoid function
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Input data
x = np.array([0.1, 0.3])
# Target
y = 0.2
# Input to output weights
weights = np.array([-0.8, 0.5])

# The learning rate, eta in the weight step equation
learnrate = 0.5

# The linear combination performed by the node (h in f(h) and f'(h))
h = x[0]*weights[0] + x[1]*weights[1]
# or h = np.dot(x, weights)

# The neural network output (y-hat)
nn_output = sigmoid(h)

# Output error (y - y-hat)
error = y - nn_output

# Output gradient (f'(h))
output_grad = sigmoid_prime(h)

# Error term (lowercase delta)
error_term = error * output_grad

# Gradient descent step
del_w = [learnrate * error_term * x[0],
         learnrate * error_term * x[1]]
# or del_w = learnrate * error_term * x
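As a quick sanity check (hand-computed from the values above, so expect small rounding differences): h = 0.1 \cdot (-0.8) + 0.3 \cdot 0.5 = 0.07, nn_output = sigmoid(0.07) \approx 0.5175, error \approx -0.3175, output_grad \approx 0.2497, error_term \approx -0.0793, and del_w \approx [-0.0040, -0.0119]. If running the snippet gives you values close to these, the update step is wired up correctly.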
Start Quiz:
Fill in the TODOs below to calculate one gradient descent step for each weight.
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

def sigmoid_prime(x):
    """
    Derivative of the sigmoid function
    """
    return sigmoid(x) * (1 - sigmoid(x))

learnrate = 0.5
x = np.array([1, 2, 3, 4])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5, 0.3, 0.1])

### Calculate one gradient descent step for each weight
### Note: Some steps have been consolidated, so there are
###       fewer variable names than in the above sample code

# TODO: Calculate the node's linear combination of inputs and weights
h = None

# TODO: Calculate output of neural network
nn_output = None

# TODO: Calculate error of neural network
error = None

# TODO: Calculate the error term
# Remember, this requires the output gradient, which we haven't
# specifically added a variable for.
error_term = None

# TODO: Calculate change in weights
del_w = None

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)
Solution:

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

def sigmoid_prime(x):
    """
    Derivative of the sigmoid function
    """
    return sigmoid(x) * (1 - sigmoid(x))

learnrate = 0.5
x = np.array([1, 2, 3, 4])
y = np.array(0.5)

# Initial weights
w = np.array([0.5, -0.5, 0.3, 0.1])

### Calculate one gradient descent step for each weight
### Note: Some steps have been consolidated, so there are
###       fewer variable names than in the above sample code

# Calculate the node's linear combination of inputs and weights
h = np.dot(x, w)

# Calculate output of neural network
nn_output = sigmoid(h)

# Calculate error of neural network
error = y - nn_output

# Calculate the error term
# Remember, this requires the output gradient, which we haven't
# specifically added a variable for.
error_term = error * sigmoid_prime(h)

# Note: The sigmoid_prime function calculates sigmoid(h) twice,
# but you've already calculated it once. You can make this
# code more efficient by calculating the derivative directly
# rather than calling sigmoid_prime, like this:
# error_term = error * nn_output * (1 - nn_output)

# Calculate change in weights
del_w = learnrate * error_term * x

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)
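For reference, running this solution should print values close to the following (hand-computed and rounded, so the exact trailing digits may differ): here h = np.dot(x, w) = 0.8, so the network output is approximately 0.6900, the error approximately -0.1900, and the change in weights approximately [-0.0203, -0.0406, -0.0610, -0.0813].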